random survival forest
The Impact of Medication Non-adherence on Adverse Outcomes: Evidence from Schizophrenia Patients via Survival Analysis
Noroozizadeh, Shahriar, Welle, Pim, Weiss, Jeremy C., Chen, George H.
This study quantifies the association between non-adherence to antipsychotic medications and adverse outcomes in individuals with schizophrenia. We frame the problem using survival analysis, focusing on the time to the earliest of several adverse events (early death, involuntary hospitalization, jail booking). We extend standard causal inference methods (T-learner, S-learner, nearest neighbor matching) to utilize various survival models to estimate individual and average treatment effects, where treatment corresponds to medication non-adherence. Analyses are repeated using different amounts of longitudinal information (3, 6, 9, and 12 months). Using data from Allegheny County in western Pennsylvania, we find strong evidence that non-adherence advances adverse outcomes by approximately 1 to 4 months. Ablation studies confirm that county-provided risk scores adjust for key confounders, as their removal amplifies the estimated effects. Subgroup analyses by medication formulation (injectable vs. oral) and medication type consistently show that non-adherence is associated with earlier adverse events. These findings highlight the clinical importance of adherence in delaying psychiatric crises and show that integrating survival analysis with causal inference tools can yield policy-relevant insights. We caution that although we apply causal inference, we only make associative claims and discuss assumptions needed for causal interpretation.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Montana (0.04)
- North America > Canada (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
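The estimation pipeline in the abstract above can be sketched in miniature: a T-learner fits one survival-curve estimator per arm and contrasts the arms' restricted mean survival times (RMST). This is a hedged illustration, not the authors' code — it substitutes a plain Kaplan-Meier estimator (no ties, no covariates) for the richer survival models the paper uses, and the toy record layout `(time, event, non_adherent)` is invented for the example.

```python
# Sketch of a survival T-learner: fit one survival-curve estimator per
# treatment arm, then compare restricted mean survival times (RMST).
# Kaplan-Meier stands in for the paper's richer survival models.

def kaplan_meier(times, events):
    """KM curve as (event_time, survival) pairs; exact only for untied times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk, surv, curve = len(times), 1.0, []
    for i in order:
        if events[i]:  # observed event (event == 1); censored subjects skip
            surv *= 1.0 - 1.0 / at_risk
            curve.append((times[i], surv))
        at_risk -= 1   # subject leaves the risk set either way
    return curve

def rmst(curve, horizon):
    """Restricted mean survival time: area under the step curve up to horizon."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t > horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)

def t_learner_effect(data, horizon):
    """RMST(arm 1) - RMST(arm 0) for records (time, event, arm).
    Assumes both arms are present; a negative value means arm 1's
    adverse outcomes arrive earlier on average."""
    arms = {}
    for t, e, arm in data:
        arms.setdefault(arm, ([], []))
        arms[arm][0].append(t)
        arms[arm][1].append(e)
    r = {k: rmst(kaplan_meier(ts, es), horizon) for k, (ts, es) in arms.items()}
    return r[1] - r[0]
```

On toy data where arm 1 fails at months 1 and 2 and arm 0 at months 3 and 4, `t_learner_effect(..., horizon=5)` returns a negative RMST difference, mirroring the "adverse outcomes arrive earlier" direction of the reported finding.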
Enhancing Visual Interpretability and Explainability in Functional Survival Trees and Forests
Loffredo, Giuseppe, Romano, Elvira, Maturo, Fabrizio
Functional survival models are key tools for analyzing time-to-event data with complex predictors, such as functional or high-dimensional inputs. Despite their predictive strength, these models often lack interpretability, which limits their value in practical decision-making and risk analysis. This study investigates two key survival models: the Functional Survival Tree (FST) and the Functional Random Survival Forest (FRSF). It introduces novel methods and tools to enhance the interpretability of FST models and improve the explainability of FRSF ensembles. Using both real and simulated datasets, the results demonstrate that the proposed approaches yield efficient, easy-to-understand decision trees that accurately capture the underlying decision-making processes of the model ensemble.
- North America > United States (0.14)
- Europe > Italy > Lazio > Rome (0.04)
- Europe > Italy > Campania (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.66)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
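Survival trees such as the FST conventionally score candidate splits with a two-sample log-rank statistic. A minimal scalar-input sketch of that statistic (not the authors' functional version), assuming event indicators of 1 for events and 0 for censoring:

```python
def logrank_statistic(times1, events1, times2, events2):
    """Two-sample log-rank chi-square statistic.
    Censored subjects stay in the risk set up to their censoring time."""
    pooled = [(t, e, 0) for t, e in zip(times1, events1)] + \
             [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)            # at risk overall
        n1 = sum(1 for tt, _, g in pooled if tt >= t and g == 0) # at risk, group 1
        d = sum(1 for tt, e, _ in pooled if tt == t and e)       # events at t
        d1 = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        o_minus_e += d1 - d * n1 / n          # observed minus expected events
        if n > 1:                             # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (o_minus_e ** 2 / var) if var > 0 else 0.0
```

A tree-growing loop would evaluate this statistic for every candidate split and keep the split with the largest value; identical groups score 0, well-separated groups score high.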
Ensemble Survival Analysis for Preclinical Cognitive Decline Prediction in Alzheimer's Disease Using Longitudinal Biomarkers
Ghosh, Dhrubajyoti, Pal, Samhita, Lutz, Michael, Luo, Sheng
Predicting the risk of clinical progression from cognitively normal (CN) status to mild cognitive impairment (MCI) or Alzheimer's disease (AD) is critical for early intervention. Traditional survival models often fail to capture the complex longitudinal biomarker patterns associated with disease progression. We propose an ensemble survival analysis framework integrating multiple survival models to improve early prediction of clinical progression in initially cognitively normal individuals. We analyzed longitudinal biomarker data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, including 721 participants, limiting analysis to up to three visits (baseline, 6-month follow-up, 12-month follow-up). Of these, 142 (19.7%) experienced clinical progression to MCI or AD. Our approach combined penalized Cox regression (LASSO, Elastic Net) with advanced survival models (Random Survival Forest, DeepSurv, XGBoost). Model predictions were aggregated using ensemble averaging and Bayesian Model Averaging (BMA). Predictive performance was assessed using Harrell's concordance index (C-index) and time-dependent area under the curve (AUC). The ensemble model achieved a peak C-index of 0.907 and an integrated time-dependent AUC of 0.904, outperforming baseline-only models (C-index 0.608). One follow-up visit after baseline substantially improved prediction accuracy (gains of 48.1% in C-index and 48.2% in AUC), while adding a second follow-up provided only marginal gains (2.1% C-index, 2.7% AUC). Our ensemble survival framework effectively integrates diverse survival models and aggregation techniques to enhance early prediction of preclinical AD progression. These findings highlight the value of longitudinal biomarker data, particularly a single follow-up visit, for accurate risk stratification and personalized intervention strategies.
- North America > United States > California (0.28)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > North Carolina (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.88)
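Harrell's C-index and simple ensemble averaging, both named in the abstract above, fit in a few lines of plain Python. The data here are toy values and this is not the authors' implementation:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among comparable pairs (subject i has an observed
    event strictly before subject j's time), the fraction where i also has
    the higher predicted risk; ties in risk score count as half."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

def ensemble_risk(score_lists, weights=None):
    """(Optionally weighted) averaging of per-model risk scores per subject.
    BMA would supply posterior model probabilities as the weights."""
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    return [sum(w * s[i] for w, s in zip(weights, score_lists))
            for i in range(len(score_lists[0]))]
```

Perfectly ranked risks give a C-index of 1.0; random risks hover around 0.5, which is why the jump from 0.608 to 0.907 reported above is a large discrimination gain.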
Self-Consistent Equation-guided Neural Networks for Censored Time-to-Event Data
Kim, Sehwan, Wang, Rui, Lu, Wenbin
In survival analysis, estimating the conditional survival function given predictors is often of interest. There is a growing trend in the development of deep learning methods for analyzing censored time-to-event data, especially when dealing with high-dimensional predictors that are complexly interrelated. Many existing deep learning approaches for estimating conditional survival functions extend the Cox regression model by replacing the linear predictor with a shallow feed-forward neural network while maintaining the proportional hazards assumption. Their implementation can be computationally intensive because the full dataset must be used at each iteration: mini-batches may distort the at-risk set of the partial likelihood function. To overcome these limitations, we propose a novel deep learning approach to non-parametric estimation of the conditional survival function using generative adversarial networks that leverage self-consistent equations. The proposed method is model-free and does not require any parametric assumptions on the structure of the conditional survival function. We establish the convergence rate of our proposed estimator of the conditional survival function. In addition, we evaluate the performance of the proposed method through simulation studies and demonstrate its application on a real-world dataset.
1 Introduction Censored time-to-event data are widely encountered in fields where understanding the timing of events, such as failure rates or disease progression, is critical but the exact event times are only partially observed due to censoring. For example, estimating survival probability based on covariate information is essential for risk prediction, which plays a key role in developing and evaluating personalized medicine. The Kaplan-Meier (KM) estimator (Kaplan and Meier, 1958), the Cox proportional hazards model (Cox, 1972), and random survival forests (Ishwaran et al., 2008) are commonly used methods for estimating survival functions. The KM estimator is a non-parametric method suitable for population-level analyses, but its utility is limited when the objective is to estimate conditional survival probabilities at the individual level. The Cox proportional hazards model offers a semi-parametric approach for estimating conditional survival functions, accommodating the incorporation of covariates.
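The Cox partial likelihood, and the at-risk set that makes naive mini-batching problematic (the limitation noted in the abstract above), can be sketched as follows. This assumes no tied event times and is illustrative only:

```python
import math

def cox_partial_loglik(times, events, xb):
    """Cox partial log-likelihood for linear predictors xb (no tied event
    times). Each event's term sums exp(xb) over everyone still at risk at
    that event time, so a mini-batch that drops at-risk subjects changes
    the denominator and distorts the likelihood."""
    ll = 0.0
    for i in range(len(times)):
        if events[i]:
            risk_set = [j for j in range(len(times)) if times[j] >= times[i]]
            ll += xb[i] - math.log(sum(math.exp(xb[j]) for j in risk_set))
    return ll
```

With two subjects, both events, and a null predictor, the first event's denominator covers both subjects and the second covers only one, giving a log-likelihood of -log 2 — a small check that the at-risk bookkeeping is right.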
- North America > United States > Massachusetts > Suffolk County > Boston (0.44)
- North America > United States > North Carolina > Wake County > Raleigh (0.24)
- Asia > Singapore (0.04)
- Law > Civil Rights & Constitutional Law (1.00)
- Health & Medicine (1.00)
Comparison of the Cox proportional hazards model and Random Survival Forest algorithm for predicting patient-specific survival probabilities in clinical trial data
Graf, Ricarda, Todd, Susan, Baksh, M. Fazil
The Cox proportional hazards model is often used for model development with time-to-event data from randomized controlled trials (RCTs). The random survival forest (RSF) is a machine-learning algorithm known for its high predictive performance. We conduct a comprehensive neutral comparison study of the predictive performance of Cox regression and the RSF on real-world as well as simulated data. Performance is compared using multiple measures, following recommendations for the comparison of prognostic prediction models. We found that while the RSF usually outperforms the Cox model on the $C$ index, Cox model predictions may be better calibrated. With respect to overall performance, the Cox model often exceeds the RSF in nonproportional hazards settings, while otherwise the RSF typically performs better, especially for smaller sample sizes. The RSF's overall performance is more affected by higher censoring rates, while the Cox model's overall performance suffers more from smaller sample sizes.
- North America > United States > Massachusetts (0.04)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > Germany (0.04)
- Europe > United Kingdom (0.04)
Random Survival Forest for Censored Functional Data
Romano, Elvira, Loffredo, Giuseppe, Maturo, Fabrizio
This paper introduces a Random Survival Forest (RSF) method for functional data. The focus is specifically on defining a new functional data structure, Censored Functional Data (CFD), for dealing with temporal observations that are censored due to study limitations or incomplete data collection. This approach allows for precise modelling of functional survival trajectories, leading to improved interpretation and prediction of survival dynamics across different groups. A medical survival study on the benchmark SOFA data set is presented. Results show good performance of the proposed approach, particularly in ranking the importance of predictor variables, as captured through dynamic changes in SOFA scores and patient mortality rates.
- North America > United States > New York (0.04)
- Europe > Italy > Campania (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.66)
- Health & Medicine (1.00)
- Law > Civil Rights & Constitutional Law (0.92)
A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data
Burk, Lukas, Zobolas, John, Bischl, Bernd, Bender, Andreas, Wright, Marvin N., Sonabend, Raphael
This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on high-dimensional data. Additionally, they may lack appropriate tuning or evaluation procedures, or are qualitative reviews, rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable conclusions. We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets. The benchmark tunes for both a discrimination measure and a proper scoring rule to assess performance in different settings. Evaluating on 8 survival metrics, we assess discrimination, calibration, and overall predictive performance of the tested models. Using discrimination measures, we find that no method significantly outperforms the Cox model. However, (tuned) Accelerated Failure Time models were able to achieve significantly better results with respect to overall predictive performance as measured by the right-censored log-likelihood. Machine learning methods that performed comparably well include Oblique Random Survival Forests under discrimination, and Cox-based likelihood-boosting under overall predictive performance. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox Proportional Hazards model remains a simple and robust method, sufficient for practitioners.
- Europe > Germany > Bremen > Bremen (0.14)
- North America > United States > Wyoming > Albany County > Laramie (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (11 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Law > Civil Rights & Constitutional Law (0.76)
- Health & Medicine > Therapeutic Area > Hematology (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Explainable AI for survival analysis: a median-SHAP approach
Ter-Minassian, Lucile, Ghalebikesabi, Sahra, Diaz-Ordaz, Karla, Holmes, Chris
With the adoption of machine learning into routine clinical practice comes the need for Explainable AI methods tailored to medical applications. Shapley values have sparked wide interest for locally explaining models. Here, we demonstrate that their interpretation strongly depends on both the summary statistic and the estimator for it, which together define what we identify as an 'anchor point'. We show that the convention of using a mean anchor point may generate misleading interpretations for survival analysis, and we introduce median-SHAP, a method for explaining black-box models that predict individual survival times.
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
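For a handful of features, Shapley values with a configurable anchor statistic can be computed exactly by subset enumeration. A sketch under that small-dimension assumption, where a coalition's value is a summary of the model over background rows with the coalition's features fixed at the explained point — `anchor=median` gives the median-SHAP idea, `anchor=mean` the usual convention. This is an illustration, not the authors' estimator:

```python
from itertools import combinations
from math import factorial
from statistics import mean, median

def shapley_values(f, x, background, anchor=median):
    """Exact Shapley values by subset enumeration (cost O(2^d * |background|),
    so only viable for small d). The coalition value is the 'anchor' summary
    of f over background rows with the coalition's features set to x."""
    d = len(x)

    def value(S):
        outs = []
        for row in background:
            z = list(row)
            for k in S:          # fix coalition features at the explained point
                z[k] = x[k]
            outs.append(f(z))
        return anchor(outs)      # mean-SHAP vs median-SHAP differ only here

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in combinations(others, size):
                w = factorial(size) * factorial(d - size - 1) / factorial(d)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi
```

For an additive model the two anchors agree; the abstract's point is that for skewed survival-time distributions they can diverge, and the median anchor is argued to be the safer reference.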
Area-norm COBRA on Conditional Survival Prediction
Goswami, Rahul, Dey, Arabin Kr.
The paper explores a variation of the combined regression strategy (COBRA) for estimating the conditional survival function. We use regression-based weak learners to build the proposed ensemble. The combination step uses the area between two survival curves as its proximity measure. The proposed model is constructed so that it performs better than the Random Survival Forest. The paper also discusses a novel technique for selecting the most important variables in the combined regression setup. A simulation study shows that the proposed variable-relevance measure works well, and three real-life datasets illustrate the model.
- Law (0.71)
- Health & Medicine > Therapeutic Area (0.47)
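The proximity measure described above — the area between two survival curves — reduces to integrating the absolute difference of two step functions. A minimal sketch, assuming each curve is a time-sorted list of `(time, survival)` pairs with S(0) = 1; smaller area means more similar curves:

```python
def area_between_curves(curve_a, curve_b, horizon):
    """Area between two right-continuous step survival curves up to horizon.
    Curves are time-sorted (time, survival) pairs; an empty list means the
    curve stays at 1 throughout."""
    grid = sorted({0.0, horizon}
                  | {t for t, _ in curve_a if t < horizon}
                  | {t for t, _ in curve_b if t < horizon})

    def step_value(curve, t):
        s = 1.0
        for tt, ss in curve:     # walk the sorted steps up to time t
            if tt <= t:
                s = ss
            else:
                break
        return s

    area = 0.0
    for left, right in zip(grid, grid[1:] + [horizon]):
        gap = abs(step_value(curve_a, left) - step_value(curve_b, left))
        area += gap * (right - left)
    return area
```

In a COBRA-style combiner, this distance would decide which weak learners' predictions are "close enough" to a query point to be pooled into the final survival estimate.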
Random survival forests with multivariate longitudinal endogenous covariates
Devaux, Anthony, Helmer, Catherine, Genuer, Robin, Proust-Lima, Cécile
Predicting the individual risk of a clinical event from the complete patient history is still a major challenge for personalized medicine. Among the methods developed to compute individual dynamic predictions, joint models have the advantage of using all the available information while accounting for dropout. However, they are restricted to a very small number of longitudinal predictors. Our objective was to propose an innovative alternative for predicting an event probability from a possibly large number of longitudinal predictors. We developed DynForest, an extension of competing-risk random survival forests that handles endogenous longitudinal predictors. At each node of each tree, the time-dependent predictors are translated into time-fixed features (using mixed models) that serve as candidates for splitting the subjects into two subgroups. The individual event probability is estimated in each tree by the Aalen-Johansen estimator of the leaf into which the subject is classified according to their history of predictors. The final individual prediction is the average of the tree-specific individual event probabilities. We carried out a simulation study to demonstrate the performance of DynForest both in a small-dimensional context (in comparison with joint models) and in a large-dimensional context (in comparison with a regression calibration method that ignores informative dropout). We also applied DynForest to (i) predict the individual probability of dementia in the elderly from repeated measures of cognitive, functional, vascular and neurodegeneration markers, and (ii) quantify the importance of each type of marker for the prediction of dementia. Implemented in the R package DynForest, our methodology provides a novel and appropriate solution for predicting events from any number of endogenous longitudinal predictors.
- North America > United States > New York (0.04)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
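The node-level idea above — summarizing a time-dependent predictor into time-fixed features usable as split candidates — can be caricatured with a per-subject least-squares line. DynForest uses mixed models, so this is only a simplified stand-in, assuming each subject has at least two distinct measurement times:

```python
def intercept_slope(ts, ys):
    """Per-subject least-squares intercept and slope: a simplified stand-in
    for mixed-model summaries that turn a time-dependent predictor into
    time-fixed candidate features for tree splits. Requires at least two
    distinct measurement times (otherwise sxx is zero)."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sxx
    return my - slope * mt, slope
```

A tree node would then split on thresholds of these intercepts or slopes (e.g., "cognitive score declining faster than x points per year") exactly as it would on any ordinary covariate.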